A robust fusion method for multilingual spoken document retrieval systems employing tiered resources
Abstract
In this study, we present two novel fusion approaches for merging subword-based and word-based retrieval methods within a multilingual spoken document retrieval (SDR) system. More than 6000 languages are spoken in the world today, so the resources needed to develop Automatic Speech Recognition (ASR) systems for such a range of languages (e.g., text and audio data, pronunciation lexicons), and accordingly the performance of those ASR systems, can be viewed as a tiered structure. Even for resource-rich languages, some applications (e.g., historical digital archives) contain acoustic and lexical variation over time, which makes it challenging to build effective, up-to-date audio indexing and retrieval systems. In this context, we focus on creating robust multilingual SDR systems that employ both word-based and subword-based retrieval methods. Our proposed algorithms employ an out-of-vocabulary (OOV) word detection module to generate hybrid transcripts/lattices. In our Dynamic Fusion (DF) approach, the hybrid transcripts/lattices are used to assign dynamic fusion weights to each subsystem. In our Hybrid Fusion (HF) approach, queries are searched through the hybrid lattices. We evaluated the proposed algorithms on a proper name retrieval task in the Spanish Broadcast News domain and on a spoken document retrieval task using our historical speech archive, the NGSW corpus [1]; in both, the proposed algorithms yield improvements over traditional fusion methods.
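The abstract does not give the weighting formula used in Dynamic Fusion, so the following is only an illustrative sketch of the general idea: a per-document fusion weight that shifts toward the subword subsystem as more of the hybrid transcript is flagged as OOV. The names `DocScores`, `static_fusion`, and `dynamic_fusion`, the assumption that scores are normalized to [0, 1], and the choice to use the OOV fraction directly as the subword weight are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DocScores:
    """Per-document inputs to fusion (hypothetical structure, not from the paper)."""
    word_score: float     # score from the word-based subsystem, assumed in [0, 1]
    subword_score: float  # score from the subword-based subsystem, assumed in [0, 1]
    oov_fraction: float   # share of the hybrid transcript flagged as OOV, in [0, 1]


def static_fusion(doc: DocScores, word_weight: float = 0.5) -> float:
    """Baseline linear fusion with one fixed weight for every document."""
    return word_weight * doc.word_score + (1.0 - word_weight) * doc.subword_score


def dynamic_fusion(doc: DocScores) -> float:
    """Linear fusion whose weights depend on the document's OOV fraction.

    Assumption: the subword weight equals the clamped OOV fraction. This is
    one simple monotone choice, used here only to illustrate the idea.
    """
    subword_weight = min(max(doc.oov_fraction, 0.0), 1.0)
    return (1.0 - subword_weight) * doc.word_score + subword_weight * doc.subword_score


def rank(docs: dict, fuse) -> list:
    """Return document names sorted by fused score, best first."""
    return sorted(docs, key=lambda name: fuse(docs[name]), reverse=True)


if __name__ == "__main__":
    docs = {
        "clean_speech": DocScores(word_score=0.6, subword_score=0.2, oov_fraction=0.0),
        "oov_heavy": DocScores(word_score=0.1, subword_score=0.9, oov_fraction=0.8),
    }
    for name, doc in docs.items():
        print(name, static_fusion(doc), dynamic_fusion(doc))
    print(rank(docs, dynamic_fusion))
```

In this sketch, a document with no detected OOV regions is scored purely by the word-based subsystem, while a document dominated by OOV regions relies mostly on the subword subsystem. A real system would likely learn or tune the mapping from OOV evidence to weights rather than use the fraction directly.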
Related resources
Robust Spoken Document Retrieval Based on Multilingual Subphonetic Segment Recognition
This paper describes the development and application of a subphonetic segment recognition system for spoken document retrieval. Following from the development of an open-vocabulary spoken document retrieval system, where the retrieval process is accomplished in the symbolic domain by measuring the distance between the parts of subphonetic segment results from pattern recognition in the acoustic...
University of Chicago at CLEF2004: Cross-language Text and Spoken Document Retrieval
The University of Chicago participated in the Cross-Language Evaluation Forum 2004 (CLEF2004) cross-language multilingual, bilingual, and spoken language tracks. Cross-language experiments focused on meeting the challenges of new languages with freely available resources. We found that modest effectiveness could be achieved with the additional application of pseudo-relevance feedback to overcome...
REINA at CLEF 2006 Robust Task: Local Query Expansion Using Term Windows for Robust Retrieval
This paper describes our work at the CLEF 2006 Robust task. This is an ad-hoc task that explores methods for stable retrieval by focusing on poorly performing topics. We ran experiments for all subtasks: monolingual (EN, ES, FR and IT), bilingual (IT→ES) and multilingual (ES→[EN ES FR IT]) retrieval. For monolingual retrieval we have focused our work on local query expansion, i.e. usi...
A System Architecture for Multilingual Spoken Document Retrieval
Finding audio and video resources on the internet is an application in increasing demand. However, search engines are usually limited to adjacent text (hand-supplied transcripts or closed captions) to index and classify multimedia documents. Clearly, a key advantage can be gained by using automatic speech recognition and natural language processing technologies, since they allow to trans...
Open-Vocabulary Spoken Document Retrieval Based On Multilingual Subphonetic Segment Recognition
This paper describes the development and application of a subphonetic segment recognition system for spoken document retrieval. Following from the development of an open-vocabulary spoken document retrieval system, where the retrieval process is accomplished in the symbolic domain by measuring the distance between the parts of subphonetic segment results from pattern recognition in the acoustic...
Publication date: 2006